Kaolack
WolBanking77: Wolof Banking Speech Intent Classification Dataset
Kandji, Abdou Karim, Precioso, Frédéric, Ba, Cheikh, Ndiaye, Samba, Ndione, Augustin
Intent classification models have made significant progress in recent years. However, previous studies have focused primarily on high-resource language datasets, which leaves a gap for low-resource languages and for regions with high rates of illiteracy, where languages are more often spoken than read or written. This is the case in Senegal, for example, where Wolof is spoken by around 90% of the population, while the national illiteracy rate remains at 42%. Wolof is spoken by more than 10 million people in the West African region. To address these limitations, we introduce the Wolof Banking Speech Intent Classification Dataset (WolBanking77) for academic research in intent classification. WolBanking77 currently contains 9,791 text sentences in the banking domain and more than 4 hours of spoken sentences. We conduct experiments on various baselines, including state-of-the-art text and speech models, and the results on the current dataset are very promising. In addition, this paper presents an in-depth examination of the dataset's contents. We report baseline F1 scores for NLP models and word error rates (WER) for ASR models trained on the WolBanking77 dataset, along with comparisons between models. The dataset and code are available at: https://github.com/abdoukarim/wolbanking77.
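The ASR baselines above are compared by word error rate. As a reminder of how that metric is usually computed, here is a minimal sketch assuming whitespace tokenization and the standard definition WER = (S + D + I) / N, where S, D and I are word substitutions, deletions and insertions and N is the number of reference words. It is not the WolBanking77 evaluation code; real evaluations often normalize text (case, punctuation) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed with a word-level Levenshtein distance."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# One substitution (b -> x) plus one deletion (d) over 4 reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Because insertions count toward the numerator, WER can exceed 1.0 when a hypothesis is much longer than its reference.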
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria > Vienna (0.14)
- Africa > Senegal > Dakar Region > Dakar (0.04)
- (28 more...)
- Banking & Finance (1.00)
- Education (0.93)
- Law (0.92)
- (2 more...)
An Analysis of Multilingual FActScore
Vu, Kim Trong, Krumdick, Michael, Reddy, Varshini, Dernoncourt, Franck, Lai, Viet Dac
FActScore has gained popularity as a metric for estimating the factuality of long-form texts generated by Large Language Models (LLMs) in English. However, no prior work has studied the behavior of FActScore in other languages. This paper studies the limitations of each component of FActScore's four-component pipeline in the multilingual setting. We introduce a new dataset for FActScore on texts generated by strong multilingual LLMs. Our evaluation shows that LLMs exhibit distinct behaviors in both fact extraction and fact scoring tasks. No LLM produces a consistent and reliable FActScore across languages with varying levels of resources. We also find that the knowledge source plays an important role in the quality of the estimated FActScore. Using Wikipedia as the knowledge source may prevent the estimate from reflecting the true FActScore of long-form text, due to Wikipedia's limited coverage in medium- and low-resource languages. We also apply three mitigations to the knowledge source that ultimately improve FActScore estimation across all languages.
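At its core, FActScore splits each generation into atomic facts, checks each fact against a knowledge source, and reports the fraction of supported facts, averaged over generations. The sketch below shows only that final aggregation step. The fact extraction and fact scoring stages (where the paper finds cross-lingual inconsistencies) are represented by hypothetical precomputed boolean judgments, and the sketch omits any length-based adjustment an actual FActScore implementation may apply.

```python
from statistics import mean
from typing import Sequence


def response_precision(supported: Sequence[bool]) -> float:
    """Fraction of one response's atomic facts judged supported by the knowledge source."""
    return sum(supported) / len(supported)


def factscore(judgments: Sequence[Sequence[bool]]) -> float:
    """Average per-response precision, skipping responses with no extracted facts.

    `judgments` is a hypothetical input: one list of support decisions per response,
    standing in for the outputs of the fact extraction and fact scoring stages.
    """
    scored = [response_precision(s) for s in judgments if len(s) > 0]
    if not scored:
        raise ValueError("no response contained any atomic facts")
    return mean(scored)


# Response 1: 3 of 4 facts supported; response 2: 1 of 2; response 3: no facts (skipped).
print(factscore([[True, True, False, True], [True, False], []]))  # 0.625
```

A weaker knowledge source turns genuinely true facts into `False` judgments, which lowers the score without any change in the text being evaluated.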
- Asia > Afghanistan (0.14)
- South America > Brazil > São Paulo (0.04)
- North America > United States > Florida > Miami-Dade County > Miami Beach (0.04)
- (16 more...)
- Media (1.00)
- Government (1.00)
- Leisure & Entertainment > Sports > Soccer (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Kallaama: A Transcribed Speech Dataset about Agriculture in the Three Most Widely Spoken Languages in Senegal
Gauthier, Elodie, Ndiaye, Aminata, Guissé, Abdoulaye
This work is part of the Kallaama project, whose objective is to produce and disseminate corpora in national languages for developing speech technologies in the field of agriculture. Except for Wolof, which benefits from some language data for natural language processing, the national languages of Senegal are largely ignored by language technology providers. However, such technologies are key to the protection, promotion and teaching of these languages. Kallaama focuses on the three languages most widely spoken by Senegalese people: Wolof, Pulaar and Sereer. These languages are widely spoken by the population, with around 10 million native Senegalese speakers, not to mention those outside the country. However, they remain under-resourced in terms of machine-readable data that can be used for automatic processing and language technologies, all the more so in the agricultural sector. We release a transcribed speech dataset containing 125 hours of recordings about agriculture in each of the above-mentioned languages. These resources are specifically designed for Automatic Speech Recognition (ASR), including traditional approaches. To build such technologies, we provide textual corpora in Wolof and Pulaar, and a pronunciation lexicon containing 49,132 entries from the Wolof dataset.
- Africa > Senegal > Dakar Region > Dakar (0.05)
- Africa > Niger (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (15 more...)
- Banking & Finance (0.93)
- Food & Agriculture > Agriculture (0.92)
- Media (0.69)
This AI Expert From Senegal Is Helping Showcase Africans In STEM
Not only has Adji Bousso Dieng, an AI researcher from Senegal, contributed to the field of generative modeling and is about to become one of the first Black female faculty members in Computer Science in the Ivy League, she is also helping Africans in STEM tell their own success stories. Dieng, who is currently a researcher at Google and an incoming computer science faculty member at Princeton, works in an area of Artificial Intelligence called generative modeling. "It allows you to learn from data without needing any supervision," she said. "Generative models have many real-world applications with regard to natural language processing, computer vision, healthcare, robotics, and in a range of sciences." In addition, Dieng started The Africa I Know (TAIK), a platform that showcases Africans who have had successful careers, highlights how Africans are leveraging technology to solve developmental problems (in agriculture, health and education), and narrates African history as told by Africans. "I founded TAIK to unearth the success stories of Africa and its people and to foster an economic and social consciousness in Africa," she said, adding that TAIK's volunteers are a group of eager young Africans from every region of the continent and that the content is published in both English and French.
- North America > United States (0.16)
- Africa > Senegal > Kaolack Region > Kaolack (0.08)
- Europe > France (0.07)
- Africa > South Africa (0.07)
- Health & Medicine > Therapeutic Area (0.61)
- Education > Educational Setting (0.51)
Visualization and machine learning for forecasting of COVID-19 in Senegal
Ndiaye, Babacar Mbaye, Balde, Mouhamadou A. M. T., Seck, Diaraf
In this article, we present visualizations and several machine learning techniques for two-week and 40-day-ahead forecasts based on public data. On July 15, 2020, Senegal reopened its airspace while the number of confirmed cases was still increasing. The population no longer observes hygiene measures and social distancing as it did at the beginning of the outbreak. Is this negligence, or fatigue with wearing masks at all times? We forecast the inflection point and the possible end of the epidemic.
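For a cumulative case curve, the inflection point is the day on which the number of new cases per day peaks; after it, growth slows even though the total keeps rising. The abstract does not describe how the authors locate this point, so the sketch below is only an illustration of the concept, not their method. It uses a synthetic logistic curve (hypothetical parameters, not Senegal data) and central differences to approximate daily new cases.

```python
import math
from typing import Sequence


def logistic_cumulative(t: float, capacity: float, rate: float, midpoint: float) -> float:
    """Cumulative cases under a logistic growth curve (illustrative model only)."""
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


def inflection_day(cumulative: Sequence[float]) -> int:
    """Day index at which the central-difference growth of cumulative counts peaks."""
    if len(cumulative) < 3:
        raise ValueError("need at least three observations")
    # Central difference approximates daily new cases at each interior day.
    growth = [
        (cumulative[i + 1] - cumulative[i - 1]) / 2.0
        for i in range(1, len(cumulative) - 1)
    ]
    # growth[0] corresponds to day 1, hence the +1 offset.
    return max(range(len(growth)), key=growth.__getitem__) + 1


# Synthetic curve: 10,000 eventual cases, growth rate 0.1/day, midpoint at day 50.
curve = [logistic_cumulative(t, 10_000, 0.1, 50) for t in range(120)]
print(inflection_day(curve))  # 50
```

On real, noisy daily counts this raw argmax is fragile; smoothing the series or fitting a growth model first is the usual remedy.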
- Africa > Senegal > Dakar Region > Dakar (0.06)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (9 more...)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Epidemiology (1.00)